Compare Page

Statistical validity

Characteristic Name: Statistical validity
Dimension: Validity
Description: Computed data must be statistically valid
Granularity: Information object
Implementation Type: Process-based approach
Characteristic Type: Usage

Verification Metric:

The number of tasks that failed or underperformed due to a lack of statistical validity in the data
The number of complaints received due to a lack of statistical validity in the data


The implementation guidelines are guidelines to follow in regard to the characteristic. The scenarios are examples of the implementation.

Guideline: Establish the population of interest unambiguously, with appropriate justification (maintain documentation).
Scenario: (1) Both credit customers and cash customers are considered for a survey on customer satisfaction.

Guideline: Establish an appropriate sampling method, with appropriate justification.
Scenario: (1) Stratified sampling is used to investigate the drug preferences of medical officers.

Guideline: Establish the statistical validity of samples: avoid over-coverage and under-coverage (maintain documentation).
Scenario: (1) Samples are taken from all income levels in a survey on vaccination.

Guideline: Maintain consistency of samples when a longitudinal analysis is performed (maintain documentation).
Scenario: (1) The same population is used over time to collect epidemic data for a longitudinal analysis.

Guideline: Ensure that valid statistical methods are used, to enable valid inferences about the data, valid comparisons of parameters and generalisation of the findings.
Scenario: (1) A Poisson distribution is used to make inferences because the data-generating events occur in a fixed interval of time and/or space.

Guideline: Ensure that the acceptable variations for estimated parameters are established, with appropriate justification.
Scenario: (1) A 95% confidence interval is used in estimating the mean value.

Guideline: Ensure that appropriate imputation measures are taken to nullify the impact of problems relating to outliers, data collection and data collection procedures, and that the edit rules are defined and maintained.
Scenario: (1) Incomplete responses are removed from the final data sample.
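
Three of the scenarios above (removing incomplete responses, stratified sampling across all income levels, and a 95% confidence interval for the mean) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the survey data are randomly generated, the field names `income_level` and `score` and all function names are hypothetical, and the interval uses a normal approximation, which is only one of several defensible choices.

```python
import math
import random
import statistics
from collections import defaultdict


def drop_incomplete(responses, required_fields):
    """Keep only responses in which every required field is present (not None)."""
    return [r for r in responses if all(r.get(f) is not None for f in required_fields)]


def stratified_sample(records, stratum_key, fraction, seed=0):
    """Draw the same fraction (0 < fraction <= 1) from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[record[stratum_key]].append(record)
    sample = []
    for members in strata.values():
        # Take at least one record per stratum to avoid under-coverage.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the mean (assumed method)."""
    mean = statistics.fmean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error


# Hypothetical survey data: 300 respondents across three income levels.
# A score of None stands for an incomplete response.
data_rng = random.Random(42)
responses = [
    {"income_level": level, "score": data_rng.choice([None, 1, 2, 3, 4, 5])}
    for level in ("low", "middle", "high")
    for _ in range(100)
]

complete = drop_incomplete(responses, ["income_level", "score"])
sample = stratified_sample(complete, "income_level", fraction=0.5)
low, high = mean_confidence_interval([r["score"] for r in sample])
print(f"95% CI for the mean score: ({low:.2f}, {high:.2f})")
```

Sampling the same fraction from each stratum keeps every income level represented in proportion to its size; whether proportional allocation is appropriate for a given survey is a design decision that the guidelines ask to be justified and documented.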

Validation Metric:

How mature is the process for maintaining the statistical validity of data?

These are examples of how the characteristic might occur in a database.

Example: If a column should contain at least one occurrence of all 50 states, but the column contains only 43 states, then the population is incomplete.
Source: Y. Lee, et al., “Journey to Data Quality”, Massachusetts Institute of Technology, 2006.
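
The coverage check described in this example can be sketched in a few lines of Python. The function name `missing_categories` is hypothetical, and the list of states is deliberately shortened to four illustrative codes; a real check would compare against all 50.

```python
def missing_categories(column_values, expected_categories):
    """Return the expected categories that never occur in the column, sorted."""
    return sorted(set(expected_categories) - set(column_values))


# Illustrative subset only; a real check would list all 50 state codes.
expected_states = {"CA", "NY", "TX", "WA"}
state_column = ["CA", "TX", "CA", "NY", "TX"]

missing = missing_categories(state_column, expected_states)
print(missing)  # ['WA']
```

An empty result means every expected category occurs at least once; any returned value points to under-coverage in the population.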

The Definitions are examples of the characteristic that appear in the sources provided.

Definition: Coherence of data refers to the internal consistency of the data. Coherence can be evaluated by determining if there is coherence between different data items for the same point in time, coherence between the same data items for different points in time or coherence between organisations or internationally. Coherence is promoted through the use of standard data concepts, classifications and target populations.
Source: HIQA 2011. International Review of Data Quality. Health Information and Quality Authority (HIQA), Ireland. http://www.hiqa.ie/press-release/2011-04-28-international-review-data-quality.

Definition: 1) Accuracy in the general statistical sense denotes the closeness of computations or estimates to the exact or true values. 2) Coherence of statistics is their adequacy to be reliably combined in different ways and for various uses.
Source: LYON, M. 2008. Assessing Data Quality, Monetary and Financial Statistics. Bank of England. http://www.bankofengland.co.uk/statistics/Documents/ms/articles/art1mar08.pdf.

 

Ease of data access

Characteristic Name: Ease of data access
Dimension: Availability and Accessibility
Description: Data should be easily accessible in a form that is suitable for its intended use.
Granularity: Information object
Implementation Type: Process-based approach
Characteristic Type: Usage

Verification Metric:

The number of tasks that failed or underperformed due to a lack of ease in data access
The number of complaints received due to a lack of ease in data access


The implementation guidelines are guidelines to follow in regard to the characteristic. The scenarios are examples of the implementation.

Guideline: Information that is routinely accessed to continue operations should be delivered to stakeholders automatically and online, so that they do not waste time searching for it.
Scenario: (1) Daily exchange rates are linked into the accounting application or maintained in a dashboard on the accountant's desktop. (2) Production efficiency is made available on a display board on the production floor.

Guideline: Information needed for management reporting purposes should be identified and provided through built-in reports, so that users do not have to create the reports themselves.
Scenario: (1) Order status is frequently searched for by different stakeholder groups, and hence a report is made available with multiple search criteria.

Guideline: Provide users with tools to query the database without specific technical knowledge and to perform business analytics that bring innovation.
Scenario: (1) The technical infrastructure supports users in developing their own reports based on dynamic information needs, without consulting technical staff.

Guideline: Enable the user to filter the relevant information depending on the need.
Scenario: (1) A sales report has filtering criteria for customer and date range.

Guideline: Interfaces and reports should be designed so that users do not have to write complex queries or further process information before use.
Scenario: (1) Product prices are ordered by "Relevance" or "Price" to help an e-commerce customer make a purchase decision.
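
The filtering and ordering scenarios above can be sketched as a small Python example. The `Sale` record, the `filter_sales` function and all sample data are hypothetical; the point is only that the user supplies a customer and a date range as parameters instead of writing a query.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Sale:
    customer: str
    sold_on: date
    amount: float


def filter_sales(
    sales: List[Sale],
    customer: Optional[str] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> List[Sale]:
    """Return the sales matching the optional customer and inclusive date range."""
    return [
        sale
        for sale in sales
        if (customer is None or sale.customer == customer)
        and (start is None or sale.sold_on >= start)
        and (end is None or sale.sold_on <= end)
    ]


# Hypothetical sales records.
sales = [
    Sale("Acme", date(2024, 1, 5), 120.0),
    Sale("Acme", date(2024, 2, 9), 80.0),
    Sale("Globex", date(2024, 1, 20), 200.0),
]

january_acme = filter_sales(
    sales, customer="Acme", start=date(2024, 1, 1), end=date(2024, 1, 31)
)
print(january_acme)

# Hypothetical product listing, ordered by price as an e-commerce page might offer.
products = [
    {"name": "Lamp", "price": 35.0},
    {"name": "Desk", "price": 150.0},
    {"name": "Pen", "price": 2.5},
]
by_price = sorted(products, key=lambda product: product["price"])
print([product["name"] for product in by_price])  # ['Pen', 'Lamp', 'Desk']
```

Leaving a criterion as `None` disables it, so the same function serves a report with any combination of filters.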

Validation Metric:

How mature is the process of maintaining ease in data access?

These are examples of how the characteristic might occur in a database.

Example: Consider a database containing orders from customers. A practice for handling complaints and returns is to create an “adjustment” order for backing out the original order and then writing a new order for the corrected information if applicable. This procedure assigns new order numbers to the adjustment and replacement orders. For the accounting department, this is a high-quality database. All of the numbers come out in the wash. For a business analyst trying to determine trends in growth of orders by region, this is a poor-quality database. If the business analyst assumes that each order number represents a distinct order, his analysis will be all wrong. Someone needs to explain the practice and the methods necessary to unravel the data to get to the real numbers (if that is even possible after the fact).
Source: J. E. Olson, “Data Quality: The Accuracy Dimension”, Morgan Kaufmann Publishers, 9 January 2003.
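
To make the analyst's problem concrete, the following Python sketch contrasts a naive count of order numbers with a count of distinct underlying orders. It rests on a strong assumption that the source does not guarantee: that each adjustment or replacement row records the number of the original order it refers to (the hypothetical `refers_to` field). The row layout, the `kind` labels and the function name are likewise invented for illustration.

```python
from collections import defaultdict

# Hypothetical rows: (order_no, region, kind, refers_to)
orders = [
    (1001, "North", "original", None),
    (1002, "North", "adjustment", 1001),   # backs out order 1001
    (1003, "North", "replacement", 1001),  # corrected version of order 1001
    (1004, "South", "original", None),
]


def count_orders_by_region(rows):
    """Return (naive counts, distinct-order counts) per region."""
    naive = defaultdict(int)
    roots = defaultdict(set)
    for order_no, region, kind, refers_to in rows:
        # Naive view: every order number is treated as a separate order.
        naive[region] += 1
        # Corrected view: adjustments and replacements collapse onto the
        # original order they refer to.
        root = refers_to if refers_to is not None else order_no
        roots[region].add(root)
    distinct = {region: len(members) for region, members in roots.items()}
    return dict(naive), distinct


naive_counts, distinct_counts = count_orders_by_region(orders)
print(naive_counts)     # {'North': 3, 'South': 1}
print(distinct_counts)  # {'North': 1, 'South': 1}
```

If no such link between rows was recorded, this reconstruction is not possible, which is the situation the example warns about. The sketch also counts an order that was backed out and never replaced as one order; whether that is right depends on the analysis being performed.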

The Definitions are examples of the characteristic that appear in the sources provided.

Definition: Accessibility refers to the physical conditions in which users can obtain data. Clarity refers to the data’s information environment including appropriate metadata.
Source: LYON, M. 2008. Assessing Data Quality, Monetary and Financial Statistics. Bank of England. http://www.bankofengland.co.uk/statistics/Documents/ms/articles/art1mar08.pdf.

Definition: Speed and ease of locating and obtaining an information object relative to a particular activity.
Source: STVILIA, B., GASSER, L., TWIDALE, M. B. & SMITH, L. C. 2007. A framework for information quality assessment. Journal of the American Society for Information Science and Technology, 58, 1720-1733.

Definition: Data are available or easily or quickly retrieved.
Source: WANG, R. Y. & STRONG, D. M. 1996. Beyond accuracy: What data quality means to data consumers. Journal of Management Information Systems, 5-33.